

1. An apparatus for executing an MMX PSADBW instruction, comprising:

subtractors, for generating packed differences of packed operands of the instruction and for generating carry bits associated with each of the packed differences;

inverters, coupled to said subtractors, for generating an inverse of each of said packed differences;

multiplexers, coupled to said inverters and said subtractors, each for selecting as an output said packed difference if said associated carry bit indicates the packed difference is positive, and for selecting as said output said inverse if said associated carry bit indicates the packed difference is negative; and

adders, coupled to said multiplexers, for adding said carry bits and said outputs of said multiplexers to generate a result of the instruction.

2. The apparatus of claim 1, further comprising:

an instruction type input, for specifying whether the PSADBW instruction or a multiply instruction is being executed by the apparatus; and

second multiplexers, coupled to said first multiplexers, for providing to said adders said carry bits and said first outputs of said first multiplexers if said instruction type input specifies the PSADBW instruction, and for providing partial products if said instruction type input specifies a multiply instruction.

3. The apparatus of claim 1, wherein said adders comprise:

first and second adders, for adding first and second pluralities of partial products of at least one multiply instruction.

4. The apparatus of claim 3, wherein said adders further comprise:

a third adder, coupled to said first and second adders, for adding first and second sums generated by said first and second adders to generate a result of the PSADBW instruction.

5. The apparatus of claim 4, wherein said third adder is also selectively employed to generate a sum of product results of said at least one multiply instruction.

6. The apparatus of claim 4, wherein said first sum comprises a sum of said carry bits.

7. The apparatus of claim 4, wherein said second sum comprises a sum of said outputs of said multiplexers.

8. The apparatus of claim 1, wherein each of said carry bits comprises a Boolean zero value if said associated packed difference is positive and comprises a Boolean one value if said associated packed difference is negative.

9. The apparatus of claim 1, further comprising:

a plurality of storage elements, for storing said carry bits.

10. The apparatus of claim 1, wherein said adders add said carry bits and said outputs of said multiplexers substantially in parallel.

11. The apparatus of claim 1, wherein a computer program product comprising a computer usable medium having computer readable program code causes the apparatus, wherein said computer program product is for use with a computing device.

12. The apparatus of claim 1, wherein a computer data signal embodied in a transmission medium comprising computer-readable program code provides the apparatus.
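
The arithmetic recited in claims 1 through 10 can be illustrated with a short software model. This is only a sketch under stated assumptions, not the claimed hardware: the function name `psadbw_model`, the eight 8-bit lanes, and the use of the borrow out of each byte subtraction as the carry bit (one for a negative difference, as in claim 8) are assumptions made for illustration.

```python
def psadbw_model(a: bytes, b: bytes) -> int:
    """Illustrative model: subtract, invert, select, then add carries and values."""
    assert len(a) == len(b) == 8       # assumed: eight packed unsigned bytes
    carries = []
    selected = []
    for x, y in zip(a, b):
        diff = (x - y) & 0xFF          # subtractor output, truncated to 8 bits
        carry = 1 if x < y else 0      # assumed encoding: 1 means negative
        inverse = ~diff & 0xFF         # inverter output (ones' complement)
        selected.append(inverse if carry else diff)  # multiplexer selection
        carries.append(carry)
    first_sum = sum(carries)           # sum of the carry bits (claim 6)
    second_sum = sum(selected)         # sum of the multiplexer outputs (claim 7)
    return first_sum + second_sum      # final addition (claim 4)


a = bytes([10, 0, 255, 3, 7, 7, 100, 1])
b = bytes([3, 5, 0, 3, 8, 6, 200, 0])
print(psadbw_model(a, b))  # 370
```

In this model, adding the carry bit to the inverse of a negative difference completes a two's complement negation, so each lane contributes the magnitude of its difference without needing a separate incrementer per byte.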

13. A microprocessor for generating a packed sum of absolute differences, comprising:

an instruction translator, for translating an MMX PSADBW macroinstruction into at least first and second microinstructions; and

an MMX unit, coupled to said instruction translator, for generating a result of said PSADBW macroinstruction in response to said at least first and second microinstructions.

14. The microprocessor of claim 13, wherein said MMX unit generates packed differences of said operands in response to said first microinstruction, and generates a sum of absolute values of said packed differences in response to said second microinstruction.

15. The microprocessor of claim 13, wherein said MMX unit comprises:

a plurality of subtractors, for generating said packed differences of said operands.

16. The microprocessor of claim 15, wherein said plurality of subtractors generate said packed differences of said operands in a single microprocessor clock cycle.

17. The microprocessor of claim 15, wherein said plurality of subtractors also generate a sign for each of said packed differences of said operands.

18. The microprocessor of claim 13, wherein said MMX unit comprises:

multiplexing logic, having a microinstruction type control input, wherein if said control input indicates said microinstruction type is of said second microinstruction, then said multiplexing logic selects selectively inverted said packed differences of said operands for providing to an adder as a plurality of addends.

19. The microprocessor of claim 18, wherein each of said packed differences is selectively inverted based on whether said packed difference is positive or negative.

20. The microprocessor of claim 19, wherein said packed difference is inverted if said packed difference is negative and not inverted if said packed difference is positive.

21. The microprocessor of claim 18, wherein if said control input indicates said microinstruction type is not of said second microinstruction, then said plurality of multiplexers select a plurality of partial products from a multiplier for providing to said adder as said plurality of addends.
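
The two-microinstruction split of claims 13 through 20 might be modeled as two stages that communicate through stored differences and signs. The names `uop_subtract` and `uop_sum_abs` are hypothetical, and adding the sign bits in the second stage is an assumption carried over from claims 1 and 22 rather than something these claims recite.

```python
from typing import List, Sequence, Tuple


def uop_subtract(a: bytes, b: bytes) -> Tuple[List[int], List[int]]:
    """First microinstruction (assumed): packed differences plus one sign per lane."""
    diffs = [(x - y) & 0xFF for x, y in zip(a, b)]
    signs = [1 if x < y else 0 for x, y in zip(a, b)]  # 1 means negative
    return diffs, signs


def uop_sum_abs(diffs: Sequence[int], signs: Sequence[int]) -> int:
    """Second microinstruction (assumed): selectively invert, then add everything."""
    addends = [(~d & 0xFF) if s else d for d, s in zip(diffs, signs)]
    return sum(addends) + sum(signs)


diffs, signs = uop_subtract(bytes([5, 9]), bytes([9, 5]))
result = uop_sum_abs(diffs, signs)  # |5 - 9| + |9 - 5|
```

Keeping the signs alongside the differences between the two stages is what lets the second stage decide, lane by lane, whether to invert.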

22. An apparatus for executing a packed sum of absolute differences instruction in a microprocessor having subtraction logic for generating differences of packed bytes in each of a minuend operand and a subtrahend operand of the instruction, the microprocessor also having logic for generating partial products of at least one multiply instruction, the microprocessor also having addition logic for adding the partial products, the apparatus comprising:

a plurality of storage elements, each for storing a sign bit for indicating whether a corresponding one of the differences is positive or negative;

a plurality of multiplexers, coupled to corresponding ones of said plurality of storage elements, each for outputting a value, wherein said value comprises said difference if said sign bit is positive and a complement of said difference if said sign bit is negative; and

multiplexing logic, coupled to said plurality of multiplexers, for selecting the partial products for provision to the addition logic when executing the at least one multiply instruction, and for selecting said sign bits and said values for provision to the addition logic when executing the packed sum of absolute differences instruction.

23. The apparatus of claim 22, wherein the addition logic adds said sign bits and said values substantially in parallel.

24. The apparatus of claim 22, wherein said sign bits and said values comprise at least 16 addends added by the addition logic.
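
The sharing of the addition logic between multiply and sum-of-absolute-differences operations in claims 22 through 24 can be sketched as a selection between two sets of addends. All names below are hypothetical, and the shift-and-add partial products are an assumption chosen for simplicity; an actual multiplier may form its partial products differently (for example with Booth encoding).

```python
from typing import List, Sequence


def partial_products(m: int, n: int, bits: int = 8) -> List[int]:
    """Assumed shift-and-add partial products of m * n, one per bit of n."""
    return [(m << i) if (n >> i) & 1 else 0 for i in range(bits)]


def select_addends(is_psadbw: bool, products: Sequence[int],
                   sign_bits: Sequence[int], values: Sequence[int]) -> List[int]:
    """Model of the multiplexing logic: choose which addends reach the adders."""
    if is_psadbw:
        return list(sign_bits) + list(values)  # 8 sign bits + 8 values
    return list(products)


def addition_logic(addends: Sequence[int]) -> int:
    """Model of the shared addition logic."""
    return sum(addends)


# Multiply path: the adders sum the partial products of 13 * 11.
product = addition_logic(select_addends(False, partial_products(13, 11), [], []))

# Sum-of-absolute-differences path: eight sign bits and eight selected values.
sign_bits = [1, 0, 0, 0, 0, 0, 0, 0]
values = [3, 4, 0, 0, 0, 0, 0, 0]   # inverse of (5 - 9), then (9 - 5), then zeros
sad_addends = select_addends(True, [], sign_bits, values)
sad = addition_logic(sad_addends)
```

With eight byte lanes, the eight sign bits and eight values together give the 16 addends mentioned in claim 24.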

25. A method for executing an MMX PSADBW instruction, comprising:

generating packed differences of packed operands of the instruction and generating carry bits associated with each of the packed differences;

for each of the packed differences, determining whether the carry bit indicates the packed difference is positive or negative;

for each of the packed differences, selecting a value in response to said determining, said value comprising the packed difference if the associated carry bit indicates the packed difference is positive and a complement of the packed difference if the associated carry bit indicates the packed difference is negative; and

adding the values selected and the carry bits to generate a result of the instruction.

26. The method of claim 25, wherein said adding comprises:

adding the carry bits to generate a first sum;

adding the values to generate a second sum; and

adding the first and second sums to generate the result.

27. The method of claim 25, further comprising:

determining whether the PSADBW instruction or a multiply instruction is being executed;

said adding the values selected and the carry bits if the PSADBW instruction is being executed; and

adding partial products if the multiply instruction is being executed.

28. The method of claim 25, further comprising:

storing the carry bits, after said generating the carry bits.

29. The method of claim 25, further comprising:

translating the PSADBW instruction into at least first and second microinstructions, prior to said generating.

30. The method of claim 29, further comprising:

said generating in response to said first microinstruction; and

said adding in response to said second microinstruction.

31. The method of claim 25, wherein said selecting is performed in parallel for the packed differences.

32. The method of claim 25, wherein said adding is performed in parallel.

33. A computer data signal embodied in a transmission medium, comprising:

computer-readable program code for providing an apparatus for executing an MMX PSADBW instruction, said program code comprising:

first program code for providing subtractors, for generating packed differences of packed operands of the instruction and for generating carry bits associated with each of the packed differences;

second program code for providing inverters, coupled to said subtractors, for generating an inverse of each of said packed differences;

third program code for providing multiplexers, coupled to said inverters and said subtractors, each for selecting as an output said packed difference if said associated carry bit indicates the packed difference is positive, and for selecting as said output said inverse if said associated carry bit indicates the packed difference is negative; and

fourth program code for providing adders, coupled to said multiplexers, for adding said carry bits and said outputs of said multiplexers to generate a result of the instruction.